A Lower Bound for Estimating High Moments of a Data Stream

Author

  • Sumit Ganguly
Abstract

We show an improved lower bound for the $F_p$ estimation problem in a data stream setting for $p > 2$. A data stream is a sequence of items from the domain $[n]$ with possible repetitions. The frequency vector $x$ is an $n$-dimensional non-negative integer vector such that $x(i)$ is the number of occurrences of $i$ in the sequence. Given an accuracy parameter $\Omega(n^{-1/p}) < \epsilon < 1$, the problem of estimating the $p$th moment of frequency is to estimate $\|x\|_p^p = \sum_{i \in [n]} |x(i)|^p$ correctly to within a relative accuracy of $1 \pm \epsilon$ with high constant probability in an online fashion and using as little space as possible. The current lower bound for space for this problem is $\Omega\left(n^{1-2/p}\epsilon^{-2/p} + n^{1-2/p}\epsilon^{-4/p}/\log^{O(1)}(n) + (\epsilon^{-2} + \log(n))\right)$. The first term in the lower bound expression was proved in [2, 3], the second in [6] and the third in [5]. In this note, we show an $\Omega(p^2 n^{1-2/p}\epsilon^{-2}/\log(n))$ bits space bound, for $\Omega(p n^{-1/p}) \le \epsilon \le 1/10$.
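To make the definitions in the abstract concrete, the following is a minimal sketch that computes the frequency vector and the exact pth moment of a small stream, and checks whether a candidate estimate meets the 1 ± ε relative accuracy requirement. It is an offline baseline that stores every count, not a small-space streaming algorithm of the kind the lower bound is about. The function names and the sample stream are hypothetical and chosen only for illustration.

```python
from collections import Counter


def frequency_vector(stream):
    """Return a mapping i -> x(i), the number of occurrences of i in the stream."""
    return Counter(stream)


def exact_fp(stream, p):
    """Exact p-th frequency moment: the sum over items i of x(i) ** p.

    Uses space linear in the number of distinct items, so it is only a
    reference baseline, not a sublinear-space streaming estimator.
    """
    return sum(count ** p for count in frequency_vector(stream).values())


def within_relative_accuracy(estimate, truth, eps):
    """True if estimate lies in [(1 - eps) * truth, (1 + eps) * truth]."""
    return (1 - eps) * truth <= estimate <= (1 + eps) * truth


# Hypothetical stream over the domain [n] with n = 3.
stream = [1, 3, 3, 2, 3, 1]
f3 = exact_fp(stream, 3)  # x = (2, 1, 3), so F_3 = 8 + 1 + 27
```

With ε = 0.1, any estimate between 0.9 and 1.1 times the true value of the moment would count as correct for this stream.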


Similar resources

Taylor Polynomial Estimator for Estimating Frequency Moments

We present a randomized algorithm for estimating the $p$th moment $F_p$ of the frequency vector of a data stream in the general update (turnstile) model to within a multiplicative factor of $1 \pm \epsilon$, for $p > 2$, with high constant confidence. For $0 < \epsilon \le 1$, the algorithm uses space $O(n^{1-2/p}\epsilon^{-2} + n^{1-2/p}\epsilon^{-4/p}\log(n))$ words. This improves over the current bound of $O(n^{1-2/p}\epsilon^{-2-4/p}\log(n))$ words by Andoni et al. in [2]. Our space upper ...


Solving the Paradox of Multiple IRR's in Engineering Economic Problems by Choosing an Optimal -cut

Until now, single values of IRR have traditionally been used to estimate the time value of cash flows. Since uncertainty exists in estimating cost data, the resulting decision may not be reliable. The most commonly cited drawback of using the internal rate of return in the evaluation of deterministic cash flow streams is the possibility of multiple conflicting internal rates of return. In this paper we p...


Estimating Most Productive Scale Size of the provinces of Iran in the Employment sector using Interval data in Imprecise Data Envelopment Analysis (IDEA)

Unemployment is one of the most important economic problems in Iran, and many of its managers plan to increase employment rates. Increasing the employment rate requires increasing economic productivity, and DEA is one of the most appropriate evaluation methods for estimating the productivity of similar organizations. Employment input and output data may be available only as intervals. In ...


Estimating Entropy and Entropy Norm on Data Streams

We consider the problem of computing information theoretic functions such as entropy on a data stream, using sublinear space. Our first result deals with a measure we call the “entropy norm” of an input stream: it is closely related to entropy but is structurally similar to the well-studied notion of frequency moments. We give a polylogarithmic space one-pass algorithm for estimating this norm ...




Journal title:
  • CoRR

Volume abs/1201.0253


Publication date 2011